15:05
2026-06-03
pytorch.org
machine-learning
Using Muon Optimizer with DeepSpeed
DeepSpeed has integrated the Muon Optimizer, a memory-efficient optimizer that uses a single momentum buffer and Newton-Schulz orthogonalization to improve training convergence, particularly for 2D we…
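
To make the two ingredients named in the summary concrete, here is a minimal, dependency-free sketch of a Muon-style update on a single 2D matrix. It is an illustration under stated assumptions, not DeepSpeed's implementation: the helper names (`matmul`, `transpose`, `newton_schulz`, `muon_step`), the hyperparameter defaults (`lr`, `beta`, `steps`), and the use of the classic cubic Newton-Schulz iteration are all choices made for this example. Production Muon implementations typically use a tuned higher-order polynomial and run on GPU tensors.

```python
# Hypothetical sketch of a Muon-style step; matrices are plain lists of rows.
# Not DeepSpeed's API: every name and default below is an assumption.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz(g, steps=30, eps=1e-7):
    """Approximately orthogonalize g with the classic cubic Newton-Schulz
    iteration X <- 1.5*X - 0.5*X*X^T*X.

    Dividing by the Frobenius norm first keeps every singular value at or
    below 1, which is inside the iteration's convergence region.
    """
    norm = sum(v * v for row in g for v in row) ** 0.5 + eps
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * xv - 0.5 * yv for xv, yv in zip(x_row, y_row)]
            for x_row, y_row in zip(x, xxt_x)
        ]
    return x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One update: accumulate the single momentum buffer, orthogonalize it,
    then take a step along the orthogonalized direction."""
    momentum_buf = [
        [beta * m + g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(momentum_buf, grad)
    ]
    update = newton_schulz(momentum_buf)
    weight = [
        [w - lr * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weight, update)
    ]
    return weight, momentum_buf


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0]]
    grad = [[3.0, 1.0], [1.0, 2.0]]
    buf = [[0.0, 0.0], [0.0, 0.0]]
    weight, buf = muon_step(weight, grad, buf)
    print(weight)
```

Only one buffer (`momentum_buf`) is carried between steps, which is where the memory saving relative to optimizers that keep two moment estimates per parameter comes from.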